From Bayesian epistemology to inductive logic
Abstract
Inductive logic admits a variety of semantics (Haenni et al., 2011, Part I). This paper develops a semantics based on the norms of Bayesian epistemology (Williamson, 2010, Chapter 7). §1 introduces the semantics; §2 then explores methods for drawing inferences in the resulting logic and compares these methods with those of Barnett and Paris (2008). §3 evaluates this Bayesian inductive logic in the light of four traditional critiques of inductive logic, arguing (i) that it is language independent in a key sense, (ii) that it admits connections with the Principle of Indifference but that these connections do not lead to paradox, (iii) that it can capture the phenomenon of learning from experience, and (iv) that while the logic advocates scepticism with regard to some universal hypotheses, such scepticism is not problematic from the point of view of scientific theorising.

§1 Bayesian Epistemology as Semantics for Inductive Logic

This section introduces the use of Bayesian epistemology as semantics for inductive logic. The material presented here is based on Williamson (2010), to which the reader is referred for further details.

¶ Bayesian Epistemology: A Primer. At root, Bayesian epistemology concerns the question of how strongly one should believe the various propositions that one can express. The Bayesian theory that answers this question can be developed in a number of ways, but it is usual to base the theory on the betting interpretation of degrees of belief. According to the betting interpretation, one believes proposition $\theta$ to degree $x$ iff, were one to offer a betting quotient for $\theta$—a number $q$ such that one would pay $qS$ to receive $S$ in return should $\theta$ turn out to be true, where the unknown stake $S \in \mathbb{R}$ may depend on $q$—then $q = x$.

This interpretation of degrees of belief naturally goes hand in hand with the claim that, were one to bet according to one's degrees of belief via the above betting set-up, one shouldn't expose oneself to avoidable losses. In particular, arguably one's degrees of belief should minimise worst-case expected loss. This starting point—the betting interpretation together with the loss-avoidance claim—can then be used to motivate various rational norms that answer the main question facing Bayesian epistemology, i.e., that specify how strongly one should believe the various propositions that one can express.

Since the context of this paper is inductive logic, we will be particularly concerned with propositions expressed in a logical language. Suppose then that $\mathcal{L}_n$ is a propositional language on elementary propositions $A_1, \ldots, A_n$, with $S\mathcal{L}_n$ the set of propositions formed by recursively applying the usual logical connectives to the elementary propositions. Let $\Omega_n$ be the set of atomic states of $\mathcal{L}_n$, i.e., propositions $\omega_n$ of the form $\pm A_1 \wedge \cdots \wedge \pm A_n$, where $\pm A_i$ is either $A_i$ or its negation.

One norm of rational belief, which we shall call the Probability Norm, says that one's degrees of belief should satisfy the axioms of probability, for otherwise, in the worst case, stakes are chosen that ensure positive expected loss—equivalently, stakes are chosen that ensure positive loss whichever atomic state turns out to be true (a so-called Dutch book). Thus degrees of belief should be probabilities in order to minimise worst-case expected loss:

Theorem 1. Define the function $P : S\mathcal{L}_n \longrightarrow \mathbb{R}$ by $P(\theta) =$ a given agent's betting quotient for $\theta$. The agent's bets on propositions expressible in $\mathcal{L}_n$ avoid the possibility of a Dutch book if and only if they satisfy the axioms of probability:

P1. $P(\omega_n) \geq 0$ for each $\omega_n \in \Omega_n$,
P2. $P(\tau) = 1$ for some tautology $\tau \in S\mathcal{L}_n$,
P3. $P(\theta) = \sum_{\omega_n \models \theta} P(\omega_n)$ for each $\theta \in S\mathcal{L}_n$.

See Williamson (2010, Theorem 3.2) for a proof. This is known as the Dutch Book Theorem, or the Ramsey–de Finetti Theorem. P1–3 offer one way of expressing the axioms of probability over the propositional language $\mathcal{L}_n$: this axiomatisation makes it clear that a probability function is determined by its values on the atomic states of $\mathcal{L}_n$.
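To make the theorem concrete, here is a minimal Python sketch (my own illustration, not from the paper) of how a bookie can exploit betting quotients that violate additivity: the hypothetical quotients for $A$ and its negation sum to 1.2, and unit stakes on both bets then guarantee the agent a loss whichever atomic state obtains.

```python
# A minimal sketch, not from the paper: exploiting betting quotients on A and
# its negation that violate the additivity axiom P3. The agent pays q*S up
# front for a bet that returns S if the proposition turns out to be true.

def net_gain(quotient, stake, truth):
    """Agent's net gain on one bet: receives the stake if true, pays quotient*stake."""
    return (stake if truth else 0.0) - quotient * stake

# Incoherent quotients: q(A) + q(not-A) = 1.2 > 1.
q_a, q_not_a = 0.7, 0.5

# The bookie sells the agent a unit-stake bet on each proposition.
for a_true in (True, False):
    total = net_gain(q_a, 1.0, a_true) + net_gain(q_not_a, 1.0, not a_true)
    print(f"A={a_true}: agent's net gain = {total:+.2f}")
# Prints -0.20 in both atomic states: a Dutch book, as Theorem 1 predicts.
```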
A second norm, the Calibration Norm or Principal Principle, says that one's degrees of belief should be calibrated to known physical probabilities. In particular, if one knows just the physical probability $P^*(\theta)$ of $\theta$ then one should set one's betting quotient for $\theta$ to this physical probability, $P(\theta) = P^*(\theta)$, for otherwise in the worst case stakes will be chosen that render the expected loss positive.¹ While in the case of the Probability Norm minimising worst-case expected loss is equivalent to avoiding sure loss (a Dutch book), in this case the link appears in a long run of bets rather than in the single bet on $\theta$ itself: if one were repeatedly to bet on $\theta$-like events with the same betting quotient for each bet, then one would be susceptible to sure loss in the long run (Williamson, 2010, pp. 39–42).

Of course, in general one might know only of certain constraints on physical probability, rather than individual physical probabilities such as $P^*(\theta)$. In such a case the Calibration Norm would say that if one knows that the physical probability function $P^*$ lies within some set $\mathbb{P}^*$ of probability functions, then one's belief function $P$ should lie within the convex hull $\langle\mathbb{P}^*\rangle$ of this set of probability functions. The reason is that if one remains in the convex hull, one's bets do not have demonstrably greater than the minimum worst-case expected loss (equivalently, one can't be forced to lose money in the long run), but outside the convex hull one can be sure of sub-optimal expected loss (equivalently, one can be forced to lose money in the long run).

¹ For simplicity of exposition we take physical probability $P^*$ to be single-case here, defined over the propositions of the agent's language $\mathcal{L}$. But we could instead take physical probability to be generic, defined over repeatedly instantiatable outcomes, and then consider in place of $P^*$ the single-case consequences of physical probability, i.e., the constraints that known generic physical probabilities impose on the agent's degrees of belief. See Williamson (2011c) on this point.

A third norm, the Equivocation Norm, says that one should not adopt extreme degrees of belief unless forced to by the Probability Norm or the Calibration Norm: one's degrees of belief should equivocate sufficiently between the basic possibilities that one can express (i.e., between the atomic states of $\mathcal{L}_n$). Equivalently, one's belief function $P$ should satisfy the constraints imposed by the other norms and otherwise should be sufficiently close to the equivocator function $P_=$, which gives the same probability to each of the $2^n$ atomic states: $P_=(\omega_n) = 1/2^n$ for each $\omega_n \in \Omega_n$. (Distance between probability functions is measured by Kullback–Leibler divergence, $d(P,Q) = \sum_{\omega_n \in \Omega_n} P(\omega_n) \log \frac{P(\omega_n)}{Q(\omega_n)}$.) Again this norm can be justified by minimising worst-case expected loss; the argument goes as follows (Williamson, 2010, pp. 63–65).

In the absence of knowledge about one's losses, one should take the loss function to be logarithmic by default, i.e., one should assume that one will lose $-\log P(\omega_n)$ where $\omega_n$ is the atomic state that turns out to be true, for such a loss function is the only one that satisfies various desiderata that are natural to impose on a default loss function. But then, under some rather general conditions, the $P$ that satisfies the constraints imposed by the other norms but minimises worst-case expected loss (the robust Bayes choice of $P$) is the $P$ that is closest to the equivocator.
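As a rough illustration (a toy example of mine, not from the paper; the evidence constraint $P^*(A_1) \geq 0.8$ and the grid resolution are assumptions), the following sketch searches the probability simplex of a two-proposition language for the calibrated belief function that minimises Kullback–Leibler divergence from the equivocator.

```python
import numpy as np
from itertools import product

# Equivocator on a language with elementary propositions A1, A2: four atomic
# states A1&A2, A1&~A2, ~A1&A2, ~A1&~A2, each assigned probability 1/4.
equivocator = np.full(4, 0.25)

def kl(p, q):
    """Kullback-Leibler divergence d(P,Q) = sum_w P(w) log(P(w)/Q(w)), with 0 log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical evidence: P*(A1) >= 0.8. Grid-search the simplex for the
# calibrated function closest to the equivocator.
best, best_d = None, np.inf
grid = np.linspace(0.0, 1.0, 51)
for w1, w2, w3 in product(grid, repeat=3):
    w4 = 1.0 - w1 - w2 - w3
    if w4 < -1e-9:
        continue
    p = np.array([w1, w2, w3, max(w4, 0.0)])
    if p[0] + p[1] < 0.8:        # P(A1) = P(A1&A2) + P(A1&~A2)
        continue
    d = kl(p, equivocator)
    if d < best_d:
        best, best_d = p, d

print(best)   # approx [0.4, 0.4, 0.1, 0.1]
```

The search settles near $P(A_1 \wedge A_2) = P(A_1 \wedge \neg A_2) = 0.4$ and $P(\neg A_1 \wedge A_2) = P(\neg A_1 \wedge \neg A_2) = 0.1$: calibration pulls $P(A_1)$ only as far as the boundary value 0.8, while $A_2$, about which the evidence is silent, remains maximally equivocal.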
Note that we need the qualification that degrees of belief should be 'sufficiently' close to the equivocator in order to handle the case where there is no function closest to the equivocator. For example, one might know that a coin is biased in favour of heads, so that $P^*(H) > 1/2$, where $H$ signifies heads at the next toss. Arguably, then, by the Calibration Norm one ought to believe $H$ to a degree greater than 1/2: $P(H) > 1/2$. But there is no degree of belief greater than 1/2 that is closest to $P_=(H) = 1/2$. Therefore the most one can expect of an agent is that $P(H)$ is sufficiently close to 1/2, where what counts as sufficiently close depends on pragmatic considerations such as the required accuracy of calculations.
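A quick numerical check (again a toy sketch, not from the paper) makes the point vivid: as $P(H)$ approaches 1/2 from above, the divergence from the equivocator shrinks towards 0, but the limit point 1/2 itself is excluded by calibration, so no admissible function is closest to the equivocator.

```python
import numpy as np

# Divergence from the belief function (p, 1-p) on {H, ~H} to the
# equivocator (1/2, 1/2): d(P, P=) = p log 2p + (1-p) log 2(1-p).
def d(p):
    return p * np.log(2 * p) + (1 - p) * np.log(2 * (1 - p))

for p in (0.6, 0.51, 0.501, 0.5001):
    print(p, d(p))
# The divergences decrease towards 0, yet p = 1/2 is ruled out by the
# evidence P*(H) > 1/2, so the infimum is never attained.
```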
In sum, the betting interpretation together with the loss-avoidance claim leads to three norms: Probability, Calibration and Equivocation. Bayesian epistemologists disagree as to whether to endorse all these norms. All accept the Probability Norm, most accept some version of the Calibration Norm, but few accept the Equivocation Norm. The Probability Norm on its own leads to what is sometimes called strict subjectivism, Probability together with Calibration leads to empirically-based subjectivism, and all three norms taken together lead to objectivism. Note that strict subjectivism doesn't imply that determining appropriate degrees of belief is entirely a question of subjective choice, inasmuch as the Probability Norm imposes substantial constraints on which degrees of belief count as rational. Neither does objectivism imply that rational degrees of belief are totally objective in the sense of being uniquely determined—there may remain an element of subjective choice, as in the biased coin example introduced above, where the agent is free to choose any sufficiently equivocal probability function.

In the light of the fact that the three norms have essentially the same motivation, it is hard to argue that this motivation warrants one norm but not another. Accordingly, we will accept all three norms in what follows, exploring the consequences of objective Bayesian epistemology for inductive logic.

¶ Predicate Languages. Thus far we have introduced Bayesian epistemology in the context of a propositional language, but the framework extends quite naturally to a predicate language, as we shall now see. Suppose then that $\mathcal{L}$ is a first-order predicate language without equality, with finitely many predicate symbols and with countably many constant symbols $t_1, t_2, \ldots$, one for each element of the domain. For $n \geq 1$, let $\mathcal{L}_n$ be the finite predicate language involving only the constants $t_1, \ldots, t_n$. Let $A_1, A_2, \ldots$ run through the atomic propositions of $\mathcal{L}$, i.e., propositions of the form $Ut$ where $U$ is a predicate symbol and $t$ is a tuple of constant symbols of corresponding arity. Order the $A_1, A_2, \ldots$ as follows: any atomic proposition expressible in $\mathcal{L}_n$ but not expressible in $\mathcal{L}_m$ for $m < n$ should occur later in the ordering than those atomic propositions expressible in $\mathcal{L}_m$. Let $A_1, \ldots, A_{r_n}$ be the atomic propositions expressible in $\mathcal{L}_n$. An atomic $n$-state $\omega_n$ is an atomic state $\pm A_1 \wedge \cdots \wedge \pm A_{r_n}$ of $\mathcal{L}_n$. Let $\Omega_n$ be the set of atomic $n$-states.

While the Calibration Norm remains the same as in the propositional case, the Probability Norm needs modification, because the predicate language requires an extra axiom of probability:

Theorem 2. Define the function $P : S\mathcal{L} \longrightarrow \mathbb{R}$ by $P(\theta) =$ the agent's betting quotient for $\theta$. The agent's bets on propositions expressible in $\mathcal{L}$ avoid the possibility of a Dutch book if and only if $P$ is a probability function:

PP1. $P(\omega_n) \geq 0$ for each $\omega_n \in \Omega_n$ and each $n$,
PP2. $P(\tau) = 1$ for some tautology $\tau \in S\mathcal{L}$,
PP3. $P(\theta) = \sum_{\omega_n \models \theta} P(\omega_n)$ for each quantifier-free proposition $\theta$, for any $n$ large enough that $\mathcal{L}_n$ contains all the atomic propositions occurring in $\theta$, and
PP4. $P(\exists x\,\theta(x)) = \sup_m P\bigl(\bigvee_{i=1}^{m} \theta(t_i)\bigr)$.

See Williamson (2010, Theorem 5.1) for a proof. PP4, the extra probability axiom, is known as Gaifman's condition. PP1–4 imply that $P(\exists x\,\theta(x)) = \lim_{m\to\infty} P\bigl(\bigvee_{i=1}^{m} \theta(t_i)\bigr)$ and $P(\forall x\,\theta(x)) = \lim_{m\to\infty} P\bigl(\bigwedge_{i=1}^{m} \theta(t_i)\bigr)$. As in the propositional case, a probability function is determined by its values on the atomic $n$-states, but in this case $n$ varies over the natural numbers.

The Equivocation Norm carries over from the propositional case as follows. Define the equivocator by $P_=(\omega_n) = 1/2^{r_n}$ for all $n$ and $\omega_n$. Define the $n$-divergence between $P$ and $Q$ to be $d_n(P,Q) = \sum_{\omega_n \in \Omega_n} P(\omega_n) \log \frac{P(\omega_n)}{Q(\omega_n)}$. Say that $P$ is closer to $R$ than $Q$ is if there is some $N$ such that $d_n(P,R) < d_n(Q,R)$ for all $n \geq N$. The Equivocation Norm says that $P$ should be, of all the functions that satisfy the constraints imposed by the Probability and Calibration Norms, one that is sufficiently close to the equivocator. As in the propositional case, it is a pragmatic question what counts as 'sufficiently' close to the equivocator.

¶ Inductive Logic. Following Haenni et al. (2011), we will concern ourselves with inductive logics that yield entailment relationships of the form
$$\varphi_1^{X_1}, \ldots, \varphi_k^{X_k} \mathrel{|\!\approx} \psi^Y.$$
Here $\varphi_1, \ldots, \varphi_k, \psi$ are propositions of some given logical language, and the superscripts $X_1, \ldots, X_k, Y$ denote inductive qualities that attach to these propositions—characterising, e.g., their certainty, plausibility, reliability, weight of evidence, or probability. Such an entailment relationship can be read: if $X_1$ attaches to $\varphi_1$, ..., and $X_k$ attaches to $\varphi_k$, then $Y$ attaches to $\psi$. Which entailment relationships hold depends very much on the semantics for the entailment relation $|\!\approx$. Normally some given semantics will say something like 'the entailment relationship holds if all interpretations that satisfy the premisses $\varphi_1^{X_1}, \ldots, \varphi_k^{X_k}$ also satisfy the conclusion $\psi^Y$', and will say what an interpretation is and what it is to satisfy the premisses and to satisfy the conclusion. In a probabilistic logic, or progic for short, $X_1, \ldots, X_k, Y$ are sets of probabilities and interpretations are probability functions.
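To see Gaifman's condition at work, consider the equivocator on a language with a single unary predicate $U$: it treats the atomic propositions $Ut_i$ as independent, each with probability 1/2, so the finite disjunctions in PP4 can be computed directly. The following toy sketch (mine, not from the paper) does just that.

```python
# Gaifman's condition PP4 under the equivocator, which makes the atomic
# propositions U(t1), U(t2), ... independent with probability 1/2 each.
p_atom = 0.5
for m in (1, 2, 5, 10, 20):
    p_disjunction = 1 - (1 - p_atom) ** m    # P=(U(t1) v ... v U(tm))
    print(m, p_disjunction)
# The disjunctions increase towards 1, so P=(exists x U(x)) = sup_m = 1;
# dually, P=(forall x U(x)) = lim_m (1/2)**m = 0.
```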
Still, there is a wide range of semantics one might give for a progic: the standard semantics says that $P$ satisfies $\theta^Z$ if and only if $P(\theta) \in Z$, but other semantics are provided by the theories of probabilistic argumentation (which is closely related to Dempster–Shafer theory), Kyburg's evidential probability, classical statistical inference, Bayesian statistical inference and Bayesian epistemology (Haenni et al., 2011, Part I).

Bayesian epistemology provides semantics in the following way. First, one interprets the logical language in which $\varphi_1, \ldots, \varphi_k, \psi$ are expressed as the language of an agent. Then one can construe the premisses $\varphi_1^{X_1}, \ldots, \varphi_k^{X_k}$ as constituting the agent's evidence of physical probabilities: they say that $P^*(\varphi_1) \in X_1, \ldots, P^*(\varphi_k) \in X_k$. Finally, interpretations are belief functions, which, by the Probability Norm, are probability functions. By the Calibration Norm, a belief function $P$ satisfies the premisses iff it lies in the convex hull of all probability functions that satisfy the relevant evidential constraints, i.e., iff $P \in \langle\mathbb{P}^*\rangle$. Now if the premisses are consistent we can simply take $\mathbb{P}^* = \{P : P(\varphi_1) \in X_1, \ldots, P(\varphi_k) \in X_k\}$. But if the premisses are inconsistent we cannot take $\mathbb{P}^* = \emptyset$: inconsistent premisses tell us not that there is no chance function but rather that there is something wrong with the premisses. In this case, then, some consistency-maintenance procedure needs to be employed. The simplest such procedure is to take $\mathbb{P}^*$ to include any function that satisfies some maximal consistent subset of the constraints $P(\varphi_1) \in X_1, \ldots, P(\varphi_k) \in X_k$. Of course, if a set of constraints is consistent then its only maximal consistent subset is the set itself, so whether or not the premisses are consistent we can take
$$\mathbb{P}^* = \biguplus\{\varphi_1^{X_1}, \ldots, \varphi_k^{X_k}\} \mathrel{\stackrel{\mathrm{df}}{=}} \{P : P \text{ satisfies some maximal consistent subset of } \{\varphi_1^{X_1}, \ldots, \varphi_k^{X_k}\}\}.$$

The conclusion $\psi^Y$ is interpreted as an assertion about rational degree of belief rather than physical probability: $P(\psi) \in Y$. The norms then dictate that the entailment relationship holds iff $P(\psi) \in Y$ for all those $P \in \langle\biguplus\{\varphi_1^{X_1}, \ldots, \varphi_k^{X_k}\}\rangle$ that are sufficiently equivocal.

It remains to say what counts as 'sufficiently' equivocal. From an epistemological point of view this depends on the context—if probabilities are only needed to 2 decimal places then this leeway can help determine what counts as sufficiently equivocal. But from the logical perspective there may simply be no contextual information available: an entailment relationship such as $\varphi_1^{X_1}, \ldots, \varphi_k^{X_k} \mathrel{|\!\approx} \psi^Y$ says nothing on its own about considerations such as accuracy. Nevertheless, there are certain things we can say about what can count as sufficiently equivocal, as we shall now see.

Let $\mathbb{E}$ denote the set of probability functions that satisfy the constraints imposed by the agent's evidence. In the above exposition of the Calibration Norm, $\mathbb{E} = \langle\mathbb{P}^*\rangle$.² Let $\downarrow\mathbb{E}$ denote the set of functions in $\mathbb{E}$ that are maximally equivocal (this set may be empty, as in the biased coin example above) and let $\Downarrow\mathbb{E}$ denote the set of functions in $\mathbb{E}$ that are sufficiently equivocal.

² This version of the Calibration Norm presumes that all constraints on degrees of belief are mediated by evidence of physical probabilities. While this is an appropriate assumption in the context of our semantics for inductive logic, where premisses are interpreted exclusively in terms of physical probabilities, the assumption may be violated in other contexts (Williamson, 2010, §3.3).

Arguably:

E1. $\Downarrow\mathbb{E} \neq \emptyset$. An agent is always entitled to hold some beliefs.
E2. $\Downarrow\mathbb{E} \subseteq \mathbb{E}$. Probability functions calibrated with evidence that are sufficiently equivocal are calibrated with evidence.
E3. If $Q \in \Downarrow\mathbb{E}$ and $R \in \mathbb{E}$ is no less equivocal than $Q$, then $R \in \Downarrow\mathbb{E}$. I.e., if $Q$ is sufficiently equivocal then so is any function in $\mathbb{E}$ that is no less equivocal.
E4. If $\downarrow\mathbb{E} \neq \emptyset$ then $\Downarrow\mathbb{E} = \downarrow\mathbb{E}$. If it is possible to be maximally equivocal then one's belief function should be maximally equivocal.
E5. $\Downarrow\Downarrow\mathbb{E} = \Downarrow\mathbb{E}$. Any function, from those that are calibrated with evidence, that is sufficiently equivocal, is a function, from those that are calibrated with evidence and are sufficiently equivocal, that is sufficiently equivocal.

In the case of a propositional language, for example, these conditions imply that
$$\Downarrow\mathbb{E} = \begin{cases} \downarrow\mathbb{E} & : \downarrow\mathbb{E} \neq \emptyset \\ \{P \in \mathbb{E} : |d(P,P_=) - d(\bar{P},P_=)| \leq \varepsilon\} & : \text{otherwise,} \end{cases}$$
for some $\varepsilon$, where $\{\bar{P}\} = \downarrow\langle\mathbb{E}\rangle$, i.e., where $\bar{P}$ is the unique function that is maximally equivocal of all those in the convex closure $\langle\mathbb{E}\rangle$ of $\mathbb{E}$. In general, if there are no contextual factors available to determine parameters like $\varepsilon$ that demarcate between sufficiently and insufficiently equivocal, the only option is to set
$$\Downarrow\mathbb{E} = \begin{cases} \downarrow\mathbb{E} & : \downarrow\mathbb{E} \neq \emptyset \\ \mathbb{E} & : \text{otherwise.} \end{cases}$$
We shall take this as the default option in the case of the Bayesian semantics for inductive logic. We shall use the symbol $\mathrel{|\!\approx}^\circ$ to signify this Bayesian entailment relation, which can thus be characterised as follows: $\varphi_1^{X_1}, \ldots, \varphi_k^{X_k} \mathrel{|\!\approx}^\circ \psi^Y$ if and only if $P(\psi) \in Y$ for each maximally equivocal member $P$ of $\langle\biguplus\{\varphi_1^{X_1}, \ldots, \varphi_k^{X_k}\}\rangle$, if there are any maximally equivocal members, or for every member $P$ of $\langle\biguplus\{\varphi_1^{X_1}, \ldots, \varphi_k^{X_k}\}\rangle$ otherwise.
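Here is a toy sketch (my own construction, not an example from the paper) of this default entailment relation at work on a language whose only elementary proposition is $A$, including the consistency-maintenance step: the premisses $A^{0.8}$ and $A^{0.3}$ are jointly inconsistent, so $\biguplus$ yields the union of the satisfiers of the two maximal consistent subsets, whose convex hull is $\{P : P(A) \in [0.3, 0.8]\}$.

```python
import numpy as np

# Premisses A^{0.8} and A^{0.3}: the maximal consistent subsets are {A^{0.8}}
# and {A^{0.3}}, and the convex hull of the union of their satisfiers is the
# set of P with P(A) in [0.3, 0.8].
lo, hi = 0.3, 0.8

def d_to_equivocator(p):
    """Divergence d(P, P=) over the two atomic states A and not-A."""
    return p * np.log(2 * p) + (1 - p) * np.log(2 * (1 - p))

# Default option: pick out the maximally equivocal member of the hull.
candidates = np.linspace(lo, hi, 501)
p_max_equiv = candidates[np.argmin([d_to_equivocator(p) for p in candidates])]
print(p_max_equiv)   # 0.5, so A^{0.8}, A^{0.3} entail A^{0.5} under |~o
```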
§2 Inferences in Bayesian Inductive Logic

Perhaps the best way to get an intuitive feel for the above inductive logic is via a series of examples. In this section we will use examples to demonstrate how inferences can be performed in Bayesian inductive logic, and to compare this logic with a closely related logic.

¶ Categorical Premisses in Propositional Inductive Logic. Consider the following invalid argument in propositional logic:
Journal: J. Applied Logic
Volume: 11
Published: 2013